Multi-feature SEIR model for epidemic analysis and vaccine prioritization

The SEIR (susceptible-exposed-infected-recovered) model has become a valuable tool for studying infectious disease dynamics and predicting the spread of diseases, particularly concerning the COVID pandemic. However, existing models often oversimplify population characteristics and fail to account for differences in disease sensitivity and social contact rates that can vary significantly among individuals. To address these limitations, we have developed a new multi-feature SEIR model that considers the heterogeneity of health conditions (disease sensitivity) and social activity levels (contact rates) among populations affected by infectious diseases. Our model has been validated using the data of the confirmed COVID cases in Allegheny County (Pennsylvania, USA) and Hamilton County (Ohio, USA). The results demonstrate that our model outperforms traditional SEIR models regarding predictive accuracy. In addition, we have used our multi-feature SEIR model to propose and evaluate different vaccine prioritization strategies tailored to the characteristics of heterogeneous populations. We have formulated optimization problems to determine effective vaccine distribution strategies. We have designed extensive numerical simulations to compare vaccine distribution strategies in different scenarios. Overall, our multi-feature SEIR model enhances the existing models and provides a more accurate picture of disease dynamics. It can help to inform public health interventions during pandemics/epidemics.


Introduction
This paper develops a new SEIR model to facilitate the epidemic analysis and make appropriate decisions regarding vaccine prioritization and social distancing during pandemics/epidemics.SIR and its variant SEIR have been widely used to analyze the dynamics of epidemics.The classical SIR model classifies people into four states: susceptible (vulnerable to disease but not carrying virus), infected (symptomatic patient who can spread the virus to others), recovered (recovered from the disease), and dead.The SEIR (susceptible-exposed-infected-recovered) model is an extension of the SIR model when there is a non-trivial incubation period.It has one additional state, exposed, referring to people exposed to the virus but currently asymptomatic.
SIR and SEIR models have been broadly utilized to study the COVID pandemic.We summarize some of the existing works.The spread of COVID among communities using SIR model is investigated in [1].Besides, short-term predictions based on dynamic regional outbreaks are conducted with SEIR model in [2].To control the spread of COVID, a study utilizes the SIR model in lockdown policies to control the epidemic [3].Moreover, the effect of social distancing is evaluated via SEIR model in [4].Apart from these studies, efforts are also made to extend the SEIR model.SEAIR extracts the asymptomatic (A) state from the exposed state to further consider different severeness of symptoms [5].A discrete-time SEIR model with timevarying parameters is applied for interval prediction, and quarantine influences [6].Nonetheless, the above models assume homogeneous populations for all aspects, including sensitivity and contact rate.
Besides, the heterogeneous population has been previously studied, and in the following, we review some of the literature.The heterogeneous population is considered by incorporating populations from multiple regions in [2].However, it assumes a homogeneous population within one region and does not consider cross-regional communication.A SIR model considers heterogeneous social interactive levels of the population with homogeneous sensitivity parameters [7].Another extended SEIR model applies group-specific sensitivity parameters to classify different severeness of symptoms but uses identical contact rates [8].Furthermore, reliable estimation of the sensitivity/infection parameters is essential for the effectiveness of SEIR models.
The forecasting quality of SIR models can be affected by the choice of parameter values [9].The potential prediction bias exists with the use of growth rate in infection from early-days estimation [7].An evaluation of SIR on COVID finds its poor performance in long-term forecasting due to the parameters not aligning with long-term changes [10].To enhance the parameter reliability and predictive capabilities, a SEIQR (susceptible-exposed-infected-quarantined-recovered) model incorporates machine learning models to optimize the value of parameters [11].Besides, some papers study the influential factors in estimation to better predict the spread [12,13].
Additionally, since the surge of COVID, there have been works on SIR and SEIR models with intervention, including prioritizing vaccines.An adapted SEIR captures the impact of containment measures affecting the infection rate [14].Nonetheless, it does not discuss the modeling of vaccination.Regarding papers on prioritizing vaccines, they assume the distinction between susceptible and exposed people and vaccinate them separately [9,15].In practice, testing is required to distinguish between susceptible and exposed people [16].
Lastly, even though most SEIR models can use time-dependent parameters, they are deterministic within every single period for estimation.The population can change unexpectedly, which makes the pre-determined vaccination plan inefficient against the mass spread of disease.To ensure the efficiency of the vaccination plan against such uncertainty, a chance constraint formulation of the minimization is proposed [17].Another study ensures that the solution is effective with a high probability [18].
In this paper, we develop a new multi-feature SEIR model with an innovative way to estimate new infections by incorporating heterogeneous population characteristics (contact rates and sensitivities to disease).To ensure estimation reliability, our model uses new infection parameters relating to the form of contact.To demonstrate the prediction quality of our model, we compare the estimated infection cases with the confirmed infections from CDC and present an improvement to the classic SEIR model.Subsequently, we utilize our new model to formulate optimization problems to prioritize vaccines, and we evaluate vaccination strategies for different types of heterogeneous populations.Results show that a strategy similar to the COVID vaccination protocol is not always the most effective under severe situations.Our designed intuition-based vaccination strategy can be as effective as the heuristic solutions to the relevant vaccine optimization problem with the advantage of less computing time.Considering the possibility of uncertainties and underestimation in our vaccine optimization problem, we further develop a chance-constraint optimization problem (CCP) to prioritize vaccines.In some cases, the heuristic solution of CCP can outperform our designed intuitionbased vaccination strategies.
This paper is organized as follows.The materials and methods section consists of four subsections.We first review the classic SEIR model.Next, we introduce our new multi-feature SEIR model, where we integrate contact rate and sensitivity to estimate the population changes and allow heterogeneity in these features.Then, we discuss the modeling of vaccine prioritization using our multi-feature SEIR and how to prioritize vaccines within different population states.We formulate vaccine prioritization as an optimization problem.To account for underestimation and uncertainties, we further propose a chance-constraint version of the optimization via Conditional Value-at-Risk (CVaR).Lastly, we provide intuition-based vaccine prioritization strategies as well as optimization-based heuristic algorithms to solve the problem.
The result section consists of four subsections.First, we compare the estimated infections, both from our multi-feature SEIR and classic SEIR model, with the confirmed COVID infections in Allegheny County (Pennsylvania, USA) and Hamilton County (Ohio, USA).Then, we discuss the choice of parameters and performance metrics.Next, we evaluate intuition-based strategies under different severeness of the epidemic.Finally, we compare the performance of our intuition-based strategies with the heuristic solutions of our proposed optimization models.In the discussion section, we summarize and discuss the findings of each numerical study.

Materials and methods
This section discusses the multi-feature SEIR model with vaccination, and its content is structured as follows.In the first subsection SEIR model review, we introduce the classic SEIR model as the foundation of our analysis.In subsection multi-feature SEIR model, we present our new model that incorporates contact rates and sensitivity parameters.In subsection multifeature SEIR model with vaccine prioritization, we outline our approach to modeling vaccine prioritization.We formulate corresponding optimization problems to minimize the change in susceptible populations.In the last subsection intuition-based vaccination and heuristic solutions, we provide solution approaches to tackle vaccine prioritization effectively.

SEIR model review
The SEIR model is an extension of SIR when there is a non-trivial incubation period.It describes the dynamics of infectious diseases by dividing the population into the following different states, Susceptible, Exposed, Infected, Recovered, Cured, and Death [3,5,19]: Susceptible ðSÞ, uninfected but vulnerable individuals who never encounter or do not carry the virus; Exposed ðEÞ, infected but asymptomatic people who carry the virus and can infect others; Infected ðIÞ, patients from state E who developed symptoms; Recovered ðRÞ, other people from state E who recovered from the disease before becoming seriously ill; Cured ðCÞ, people previously infected but recovered from the disease; Dead ðDÞ, infected people who died due to the disease.Note that the difference between recovered and cured is that cured people can be observed since they are previously symptomatic.While recovered people is asymptomatic and therefore not distinguishable from susceptible and exposed people, unless via testing.The total population is the sum of the populations in all states, except for the population in D. The single-direction flow assumes that people can only be infected once.Parameters (rates of change for one predefined period) on the arrows are the sensitivity parameters corresponding to the state a person currently stays in.Effective contact rate/transmission rate (β) counts the average number of new infections caused by effective contact (meeting), where virus transmission happens between one infectious individual and one susceptible.Exposed-infected rate (γ) is the percentage of exposed people developing symptoms, estimated by the incubation period.Recovery rate for exposed (σ) is the percentage of exposed people recovering.It is estimated by the corresponding recovery time.Cured rate for infected (θ) is the percentage of infected people recovering.It is also estimated by the corresponding recovery time.Death rate (δ) is the percentage of infected people who die due to the disease.It is estimated by case fatality rate, the proportion of deaths compared to the total number of people diagnosed with the disease for a particular period.
The following system of equations summarizes the law of motion for the SEIR model for discrete time.Each term X t for X 2 {S, E, R, I, C, D} refers to the population of state X 2 fS; E; I; R; C; Dg at the beginning of the t-th period.
We use N t to denote the total alive population during t-th period.We assume that effective contact happens only between S and E people in (1a) in the rest of our discussion if infected individuals can be quarantined.A detailed explanation can also be found in [3,5].
The classic SEIR model allows both continuous and discrete change for prediction.Nonetheless, it assumes homogeneous individuals, regardless of different demographic features that can affect the epidemic.This also affects the analysis of testing or vaccination on different people.Therefore, we extend the SEIR by considering multiple features of the population to provide a more accurate prediction of infection.Moreover, we propose a new vaccination prioritization model for heterogeneous populations using the framework of multi-feature SEIR for effective control of the disease.

Multi-feature SEIR model
This subsection extends the classic SEIR model to consider different social activity levels (contact rate) and health conditions (sensitivity).Moreover, we modify the estimation of newly exposed people using new infection parameters, contact rate, and sensitivity, which can be different across the population.Existing models estimate new exposed people using β in (1a).This parameter is estimated by a state population and assumes uniform behavior [12].It is also difficult to distinguish different sensitivities using β [20].
We consider the extension of the classic SEIR model by adapting different contact rates and sensitivity (rates of change) [21].Fig 2 shows the dynamics of the new multi-feature SEIR model, where we use (i, j) division to distinguish different people with sensitivity s i and contact rate c j .Within each division, people are assumed to be identical.The term X ði;jÞ t stands for the people of state X in (i, j) division during t-th period.f(c, λ) plays a similar role to β, representing the rate of change for susceptible people.It depends on both contact rate c and probability of infection of close contact λ.The f(c, λ) function is the right-hand side of (2).Our new model considers a new infection parameter λ.The calculation of λ considering the form of contact, cough volume, distance, and other related factors has been discussed in [22].We assume uniform λ across the population since it only depends on the form of contact and is irrelevant to health condition [22,23].We allow λ to be time-varying due to the contact form can be frequently affected by the variants of the virus, people, and regulations [22].The rest of the notations are explained in Table 1.
Eq (2) is alternative to (1a) in classic SEIR.Instead of using β, we use infection probability for close contact, contact rate, and sensitivity to estimate the population change for state S and E. The population change for S ði;jÞ t regarding contacts with all exposed people is estimated as follows: To justify (2), we first consider the contact between S Note that this is an estimation for contacted susceptible people, which has been used in [24].E ðm;kÞ t is the number of exposed population in (m, k) division with sensitivity s m and contact rate c k during t-th period.c k is the average number of different people met by an exposed person in (m, k) division for a predefined period.Its value can be estimated via social network simulation [25][26][27].People in E ðm;kÞ t make c k E ðm;kÞ t number of contacts.Among these contacted people, approximately pðS ði;jÞ t Þ of them belong to state S ði;jÞ t .The pðS ði;jÞ t Þ denotes the proportion of S ði;jÞ t among the current total population (N t − I t ), assuming infected people in quarantine.Thus, we can calculate the newly exposed people in (i, j) division for t-th period, which is the number of people leaving S ði;jÞ t .It is estimated by the number of contact happening multiplied by infection probability λ: A negative sign is due to people leaving S ði;jÞ t .λ is the infection probability measuring the possibility of virus transmission between an exposed and a susceptible person [22].Eq (4) estimates new exposed in an analogous way to (1a).In (1a), E t S t estimates all possible contacts (I t is ignored by quarantine assumption).The term β � E t S t gives the number of newly exposed people who get virus-transmitted in effective contact.Similarly, Eq (4) estimates the contact number by CðS ði;jÞ t ; E ðm;kÞ t Þ.The λ calculates the average number of effective contact (virus transmitted to a susceptible person) happening per contact among CðS ði;jÞ t ; E ðm;kÞ t Þ.The summation is over sensitivity index m and contact index k, because the change for S ði;jÞ t is caused by contacts with exposed people from every (m, k) division.E k t , the population of exposed people with contact rate c k , is calculated by Eq (5), which is also suitable for other states and total population N: The basic reproduction number, R 0 (t), can be estimated following the rational of Eq (2).R 0 (t) quantifies the expected number of new cases generated by a single case at a given time t, where all individuals are susceptible to infection, with the assumption that no other individuals are infected or immunized [28,29].The R 0 (t) can be estimated by: R 0 ðtÞ ¼ The term P i;j l S ði;jÞ t N t À I t c k estimates the new cases resulting from a single exposed individual with a contact rate of c k .We take the average with respect to p c;k t , the proportion of people with contact rate c k at time t.
For population change of other states in (i, j) division, corresponding changes are made to the exposed state, while changes in other states remain the same as the classic SEIR model in (1c) to (1g), except for replacing X t by X ði;jÞ t :

Multi-feature SEIR model with vaccine prioritization
In this subsection, we utilize the multi-feature SEIR to model vaccine prioritization among different population groups/divisions.We assume that vaccinated people will no longer be infected, and the vaccine will take effect in the next coming period.Our model defines a new variable to measure the proportion of vaccinated people among asymptomatic people (susceptible, exposed, recovered).We give the final system of equations of our multi-feature SEIR with vaccine prioritization.Lastly, we define a chance-constraint optimization problem in response to the difference between reality and estimation.
Vaccine prioritization modeling.To model vaccination, we add a new state Immunized ðVÞ, to indicate immunized people after vaccination.A decision variable v ði;jÞ t is defined to measure the proportion of vaccinated people among asymptomatic people (susceptible, exposed, and recovered) in (i, j) division.We assume that asymptomatic people in the same division are equally likely to get vaccinated.The vaccination of infected and cured people is not considered in this paper, since their population is observable and much easier for planning.But it can be adjusted to our model by the cured rate for infected people (θ or θ i ).In the following, we discuss the population change of each state for (i, j) division.
The following equation explains the susceptible population change regarding contact with exposed people of all contact rates and sensitivity under vaccination: v ði;jÞ t is the vaccination coverage ratio for people in (i, j) division during the t-th period of time.DS ði;jÞ t represents the population of virus-transmitted, unvaccinated people, who will be in state E tþ1 for the next period.Consider the t-th period, for the people in state S, there are four possible changes to them.First, there are people not receiving the virus and remaining susceptible for t + 1.Second, there are people not getting virus transmitted and getting vaccinated, who will be in state V tþ1 for the next period.Moreover, there are virus-transmitted susceptible people, whose population is estimated by Eq (2).Some of them are vaccinated to be in state V tþ1 .The remaining unvaccinated people, whose population is ΔS t , will be in state E tþ1 .The population of all vaccinated susceptible people is estimated by Eq (8), which is part of V tþ1 : Using the ratio v ði;jÞ t , Eq (9) estimates DS ði;jÞ t , the population of new exposed people for the next period, who are unvaccinated and virus-transmitted S people: We assume vaccine becomes effective in the next period.Thus, the population S  As we consider all asymptomatic people, v ði;jÞ t is also applied for state E and R. Based on (1b), the change for population of E ði;jÞ t under vaccination is: The term ð1 À v ði;jÞ t Þs E i E ði;jÞ t represents the population becoming recovered from state E in Eq (10).We vaccinate v ði;jÞ t proportion of recovered people, leaving the remaining proportion in state R. Lastly, the effect of vaccine is also reflected in the change in the infected population.Based on Eq (1d), we have the following change for the infected population: We consider vaccinating people in states S, E, and R. Vaccination and other medical treatment of people in state I can be reflected in the cured rate for infected, θ i , and it is out of the scope of this paper.
With given population of all states during t-th period, for different sensitivity i = 1, � � �, M and different contact rate j = 1, � � �, K, our multi-feature SEIR model with vaccination gives the dynamics of population changes for each (i, j) division: Eq (13a) ensures the non-negativity of susceptible population.Since other sensitivity parameters are far less than 1, other states are guaranteed to be non-negative all the time in (13c)-(13h).Eq (13i) gives the cumulative vaccinated population.Eq (13j) calculates the population with same contact rate and total population for each state.
Optimization formulation.With a given number of vaccines for each period, we can formulate the vaccination prioritization problem as an optimization problem using the multi-feature SEIR model with vaccination.To decide the best practical vaccination strategy, we minimize the summation of DS ði;jÞ t from ( 9), the population of new exposed people over all i, j and all time t, subject to constraints on the amount of available vaccine and multi-feature SEIR model.This number is responsible for the latent infection, as well as all exposed, infected, and death.The optimal solution is a sequence of v ði;jÞ t for all t, i, j, deciding the vaccine coverage ratio for each population division in each period.For given population of each state and V min t ; V max t for all t, the optimization is as follows: gives the population of each (i, j) division in each state at the beginning, where p ði;jÞ 0 is the initial proportion of (i, j) division in total population of all states.Constraint (14d) represents the vaccination requirement for each period, with the minimum vaccination requirement V min t and maximum available vaccine V max t .It is estimated by p ði;jÞ 0 ¼ p s;i 0 � p c;j 0 , with the proportions defined in Table 1.Constraint (14e) and (14f) are the practical constraints on vaccine coverage ratio and population being non-negative.
Chance-constraint optimization.The previous discussion assumes DS ði;jÞ t following the estimation of Eq (9), resulting in a static model for each period.Since the actual change in population can deviate from our model estimation, a chance-constraint optimization problem is defined using Conditional Value-at-Risk (CVaR).First, we explain its intuition.Then, we propose corresponding constraints to the optimization.
In reality, the value of DS To ensure the effectiveness of F ði;jÞ t and our proposed solution under most situations, one approach is adding a probabilistic constraint [18]: Namely, with the probability at least α, we want our estimation F ði;jÞ t to be conservative (more than the actual amount), making our vaccination plan sufficient.The mathematical formulation of ( 15) is done via Value-at-Risk (VaR) [30]: where VaR α is defined as: VaR a ðD � S ði;jÞ t Þminfφ ði;jÞ t jPðD � S ði;jÞ VaR α refers to the minimum value of the D � S ði;jÞ t greater than our model estimation (failing our estimation), happening with probability α.The notation � ði;jÞ t represents a relevant value that can be compared with F ði;jÞ t .Nonetheless, to avoid the potential computationally tractability issues brought by Value at Risk [31], we consider another popular performance metric: where CVaR α is defined to be the conditional expectations in excess of VaR α : On account of our model deciding vaccination via Eq (9), chance constraint (18) enforces more vaccines to be planned.We minimize our estimation F ði;jÞ t , for its being the best thing we know about D � S ði;jÞ t .The resulting chance-constraint optimization problem becomes: Due to DS ði;jÞ t being stochastic, we generate a realized value D � S ði;jÞ t in Eq (20c).For the other future periods, we use DS ði;jÞ t ¼ F ði;jÞ t as shown in Eq (20d), meaning that we believe future DS ði;jÞ t following our estimation.This also creates the difference between actual and estimated value in state S, E, etc. Correspondingly, under such difference, (20h) is introduced to ensure efficiency.We do not consider the difference between DS ði;jÞ t and F ði;jÞ t for the future (t � 1), since they are unrealized.Thus, we still have DS ði;jÞ t ¼ F ði;jÞ t for t = 1, � � �, T − 1.However, in our heuristic solution approach that we discuss later, we can apply the Optimization for small T repeatedly, i.e., solve for T = 4 sequentially.This will continuously consider the difference between the actual and estimated value of DS ði;jÞ t and the population in all states for a future time.

Intuition-based vaccination and heuristic solutions
Solving the Mixed-Integer-Problem (MIP) reformulation of Optimization ( 14) and ( 20) is time-consuming.For many small instances with two different contact rates, two sensitivities, and for T = 5, it takes more than 12 hours to get the optimal solution.For efficiency, we propose intuition-based strategies and heuristic solutions to the optimization problem.First, we introduce some intuition-based strategies, prioritizing vaccines based on contact rate and sensitivity.Then, we discuss a heuristic solution, a modified greedy algorithm that sequentially solves the optimization problem in shorter periods.First, we consider three intuition-based vaccine prioritization strategies for different groups of people based on their contact rate and sensitivity.The first one is noted as C*S, vaccination considering contact rate and sensitivity simultaneously.We decide which (i, j) division gets vaccinated first, based on the value of c j � s i (contact rate times sensitivity).The larger it is, the higher priority the division has.We choose sensitivity s = γ, the exposed-infected rate, when prioritizing the vaccine.The second one is noted as S1C2, vaccination considering sensitivity as the priority and contact rate secondly if two people have the same sensitivity.This is the most related to the current protocol, where younger and older people with higher risk get vaccinated first.The third one is noted as C1S2, vaccination considering contact rate as the priority and sensitivity secondly if there is a tie in contact rate.We reverse the order to see if there is an improvement.
Additionally, we use a heuristic solution, a modified greedy algorithm to heuristically solve the optimization.We denote the solution of static Optimization (14) as "Static", and the solution of chance-constraint Optimization (20) with Eq (21) as "Stochastic".Each optimization is solved sequentially with a short time slot of T = T g periods.Take T g = 4 as an example, we solve each optimization for t = 0, 1, 2, 3 together to decide the arrangement for the first week.For the second week, we solve each optimization for t = 1, 2, 3, 4 together.

Results
We utilize the multi-feature SEIR model, Optimization (14) and (20) to conduct several numerical experiments evaluating vaccination prioritization strategies.Many of the parameters of our experiments are chosen based on the COVID epidemic/pandemic [3,5,19].
This section is organized as follows.In the first subsection comparison with actual confirmed cases, to show the usefulness of our model, we compare the confirmed COVID infections from CDC with the estimated infections using multi-feature SEIR and the classic SEIR model.Afterward, in subsection numerical settings and evaluation metric, we give a numerical choice for sensitivity, contact rate, population, etc., and the performance metric for vaccination strategy evaluation.In subsection comparison of intuition-based vaccination, to select the best intuition-based strategy, we evaluate the performance of different vaccination strategies and benchmark them under different situations.In the last subsection comparison of heuristic solutions for optimization problems, we compare the intuition-based strategies with the heuristic solutions of the optimization problems under a severe situation.

Comparison with actual confirmed cases
To validate our model, we leverage the COVID data for confirmed cases in Allegheny County, Pennsylvania, USA and Hamilton County, Ohio, USA sourced from the CDC [33] and the USAFACTS dataset [34].We then compare this data with the estimated infection cases derived from both the multi-feature SEIR and classic SEIR models.
Our choice for the starting time is approximately when the confirmed cases have reached a substantial level.The endpoint is selected to be approximately the end of the last wave preceding the widespread vaccination distribution.As a result, the time period considered for Allegheny County spans from early May 2020 to June 2021, while for Hamilton County, it spans from late March 2020 to the beginning of July 2021.In Fig 4, the confirmed case shows three waves of the pandemic in Allegheny County.To align with reality, we consider three stages for the spread different sub-population is assigned to each stage based on the total confirmed cases and related regulations.The first stage begins in late April 2020, when the spread of COVID was about to start again (new daily confirmed cases started to rise after the time of decline).The second stage begins in the middle of August when the new confirmed cases become stable.The last stage begins in late October 2020, when the new confirmed cases started to rise again.The beginning time of these stages is referred to historical data but can also be decided based on the medical prediction of the next coming wave.For each of them, we fit the parameters for the best estimation.
In For both datasets, we measure the accuracy of both SEIR models by the following estimation error �.Here, I k represents the number of week k infections.
As we observed, the prediction of multi-feature SEIR aligns with the actual data much more than the classic model.Classic SEIR does not accurately identify the pandemic pattern and predicts the peak much earlier, and our multi-feature SEIR predicts the patterns and peaks much more accurately.We also see that the estimated infection of multi-feature SEIR is slightly higher than the confirmed cases most of the time.This is because the confirmed cases only include the reported ones and underestimated the actual number.The number of total infections can be higher than the number of total confirmed infections [35].In addition, we quantified performance using the estimation error, �, by considering the estimated weekly infection population and confirmed weekly infections in Table 2.The findings indicate the superiority of our proposed multi-feature SEIR model.

Numerical settings and evaluation metric
Next, we introduce the settings of population parameters, as well as performance metrics in vaccination strategy evaluation.
Sensitivity (s).Sensitivity represents vulnerability, indicating how likely a person will transit to another disease state.We utilize a common measurement of sensitivity, the rate of transition between states, and these rates are selected based on the actual duration a person spends in a specific state [1][2][3].Different sensitivities are applied to different states a person  may occupy.For our simulation, we divide all sensitivities into two groups: one with higher sensitivity (greater vulnerability) and one with lower sensitivity.Table 3 provides the range of choices based on the duration data presented in [3].
Based on the table, we choose the value of high sensitivity to be s h 2 {0.2, 0.14}, and low sensitivity to be s l 2 {0.1, 0.07}.
Contact rate (c).For our simulation, we set two contact rate groups based on the simulations in [26].High contact rate is from {25, 20, 15, 10} and low contact rate from {15, 10, 5}.The contact rate of the first group is always higher than the second group.
Initial exposed population.To see how the initial exposed population affects the result, we set the initial exposed population to take up α proportion of the susceptible population, α 2 {0.1%, 0.2%, 0.5%, 1%}.The lower the value of α is, the less severe the outbreak will be.States other than susceptible and exposed are zero at the beginning.
Vaccine amount.The maximum amount of vaccines in one period is decided by V total /T c , where V total is the total amount of vaccines available during the whole time we consider, and T c represents the total time planned to vaccinate the whole population (vaccine is evenly distributed for each period).We estimate V total by the initial total population and consider multiple vaccine doses.We set T c = 100 weeks, representing nearly two years.
Parameter selection.For experiments, we choose the sensitivity based on the actual duration a person spends in a specific state [3].The contact rate can be chosen based on the particular social network.We chose the contact rate from the social network simulation results in [27].The proportion values, initial exposed population, and vaccine amount can be varied and tailored to fit specific scenarios and populations.For example, the initial exposed population can be available after observing a disease outbreak.
Performance metrics.The effectiveness of the vaccination strategy is measured in terms of infection population and cumulative death.To measure cumulative death, we use death proportion.It is the ratio of cumulative death to the total initial population.Besides, we compute the average loss ratio for each strategy over all non-winning cases.We define the loss ratio of the highest infection population as follows: Loss ratio of highest I t ¼ max 0�t�T fI t g of given strategy À max 0�t�T fI t g of the best strategy max 0�t�T fI t g of given strategy : We do the same to define the loss ratio of cumulative death.

Comparison of intuition-based vaccination
In this subsection, we conduct simulations using populations with constant parameters over time.Even though the reality is usually time-varying, we use static simulation to provide suggestions for a short period.We compare the strategies under 8000 different situations of population characteristics.Lastly, our study provides statistical support for the efficacy of S1C2 vaccination strategy.Static simulation for effectiveness comparison.As we discussed before, we consider the performance in two aspects, highest infection and cumulative death.We study the effectiveness of different vaccination strategies by the winning rate of the highest infection proportion and death proportion for each aspect.We also summarize the average loss ratio to analyze the gap compared to the best strategy.Through the simulations for 8000 situations of population characteristics, we observe that the effectiveness of strategy heavily depends on the severity of the epidemic/pandemic.Thus, we only compare the strategies under similar severeness.The result shows that S1C2 strategy performs the best in most cases, but C*S and C1S2 can be better under severe situations, where high contact rate and high sensitivity people take at least 50% of the total population.
Below we summarize the performance of each strategy based on the different severeness of the situation.The severeness is modeled by the initial proportion of the high contact rate population and high sensitivity population.Four situations are considered: 1) High-High refers to severe situations, where the first High refers to people with high sensitivity taking up the majority (>50% of the total population), and the second High refers to people with high contact rate taking up the majority of the total population; 2) Low-High refers to the situation where low sensitivity people take up the majority in terms of sensitivity, and high contact rate people is more than 50%; 3) High-Low refers to the situation where high sensitivity people is more than 50%, and low contact rate people is more than 50%; 4) Low-Low refers to the situation where both low sensitivity people and low contact rate people take up the majority of the total population.Each situation contains 2000 simulations (4 initial exposed proportions, 4 sensitivities, 5 contact rates, 5 sensitivity proportions, and 5 contact rate proportions).
In the following, we compare the winning rate and average loss ratio for each situation.The winning rate counts the percentage of winning in terms of the two metrics (highest infection proportion and death proportion) among 2000 cases.The average loss ratio is defined at the beginning of result section, indicating the difference to the best strategy.Tables 4 and 5 exhibit the winning rate of each strategy in terms of highest infection proportion and death proportion, respectively.Tables 6 and 7 shows the average loss in terms of highest infection and cumulative death population, respectively.Strategy C_only and Random are omitted for they do not win under any situation.
In Table 4, each percentage represents the winning chance of having the lowest value in the highest infection proportion among 2000 simulated cases.S1C2 performs the best in all situations, winning more than 97% of cases in terms of the highest infection proportion.The hundred percent under High-High situation in the first row of Table 4 means that all three For cases where a strategy does not win in the highest infection proportion, we calculate the average loss ratio to investigate the difference with the best strategy in Table 6.The average is taken over all non-winning cases among 2000 simulated cases.Most percentages are less than 1%, showing a small loss to the best strategy.This also shows that a low winning rate does not necessarily mean poor performance.Hence, S1C2 is still the most reliable strategy, with a high winning chance and small average loss when it is not the best strategy.
We further evaluate the performance in terms of cumulative death in Tables 5 and 7, which is also proportional to the cumulative number of infections.
In Table 5, each percentage represents the winning probability of having the lowest value in death proportion among 2000 simulated cases.Under severe situations (High-High), C1S2 performs the best, and C*S is also better than S1C2.In other situations, the S1C2 strategy is the best.When the situation is getting less severe (moving vertically along the situation column), S1C2 has an increasing winning chance.This indicates the superiority of S1C2 in unsevere situations, where low sensitivity or low contact rate people take the majority.
For cases where a strategy does not win in terms of death proportion, we calculate the average loss ratio to investigate its difference to the best strategy in Table 7.The average is taken over all non-winning cases among 2000 simulated cases.Similarly, most percentages are less than 1%, showing a small loss to the best strategy.Note that under the High-High situation, the average loss for S1C2 is only 0.07%.In conclusion, C1S2 is preferred under severe (High-High) situations.In circumstances where we do not know the severity of the situation, S1C2 is suggested because its difference to the best strategy is marginally small under unfavorable situations.
Static simulation for severe situation.In Table 6, each percentage represents the winning probability of having the lowest value in death proportion among 2000 simulated cases.Under severe situations (High-High), C1S2 performs the best, and C*S is also better than S1C2.In other situations, the S1C2 strategy is the best.When the situation is getting less severe (moving vertically along the situation column), S1C2 has an increasing winning chance.This indicates the superiority of S1C2 in unsevere situations, where people with low sensitivity or low contact rates take the majority.
For cases where a strategy does not win in terms of death proportion, we calculate the average loss ratio to investigate its difference to the best strategy in Table 7.The average is taken over all non-winning cases among 2000 simulated cases.Similarly, most percentages are less than 1%, showing a small loss to the best strategy.Note that under the High-High situation, the average loss for S1C2 is only 0.07%.In conclusion, C1S2 is preferred under severe (High-High) situations.In circumstances where we do not know the severity of the situation, S1C2 is suggested because its difference from the best strategy is marginally small under unfavorable situations.
In Tables 8 and 9, we have four situations, and each rate is computed from 500 cases.All four situations have a similar pattern to the High-High situation in Tables 6 and 7.This indicates that the initial proportion of population does not affect the performance of a strategy.In Tables 10 and 11, for each sensitivity situations, we consider 500 cases to compute each percentage.C1S2 wins the majority of the time.S1C2 has its highest winning chance with sensitivity γ = (0.14, 0.1), and it is largely different from other cases.The total winning rate of S1C2 and C1S2 is 100%.In Tables 12 and 13, for each contact rate cases, we have 400 cases to compute each percentage.C1S2 has the highest winning rate, and the total winning rate of S1C2 and C1S2 is 100% as well.Through these comparisons, we witness that the initial proportion of population does not have much impact on the effectiveness of intuition-based strategies.In contrast, sensitivity and contact rate have more influence.Meanwhile, C1S2 works the best under severe situations, regardless of different population features.

Comparison of heuristic solutions for optimization problems
In this subsection, we evaluate the performance of heuristic solutions and intuition-based strategies, considering the uncertainty of disease transmission (influence on DS ði;jÞ t ).First, we give the distribution of DS ði;jÞ t .Next, simulations are conducted, and the performance of all solutions is compared.Furthermore, the results highlight the robustness of our chance-constraint formulation, affirming its effectiveness in scenarios with uncertainty.
In the following numerical study, we define the distribution of DS ði;jÞ t and f D ðDS ði;jÞ t Þ as discrete distribution in Table 14, with given support centered around our estimation F ði;jÞ t : The discrete distribution make the integral and expectation in Eq (21) easy to compute.Note that for the first period of Optimization ( 14) and (20), we randomly generate a value for D � S ði;jÞ t for each division.Since we have 4 divisions in the experiment, they are generated independently.Thus, our estimation may underestimate the real situation for some (i, j) divisions, but is effective for other divisions.We use heuristic solutions for Optimization (14) and (20).Solution of static Optimization ( 14) is noted as "Static".Solution of Optimization (20)  Comparison between intuition-based and heuristic solutions.Lastly, we compare intuition-based strategies and heuristic solutions under one severe situation.To measure the performance of each strategy, we consider the highest exposed, infected population over time and the cumulative death as the performance metric.We also take the average from 20 simulations for each situation.We set α = 0.95 in Constraint (20h).
We find that both heuristic solutions are better than intuition-based ones.Moreover, with the uncertainty in DS ði;jÞ t , the Stochastic solution from Optimization ( 20) is more efficient than the Static solution from Optimization (14).However, in some scenarios, intuition-based strategies can be as effective as the modified greedy algorithm and is less time-consuming.vaccination strategies (C*S, S1C2, C1S2) and heuristic solutions (Static and Stochastic, noted as CCP for short).For Static and Stochastic (CCP) solutions, we approximated the optimal solution by modified greedy solution with T g = 4, which sequentially solves the optimization for T = 4 periods (one month).The solutions are provided by the Gurobi solver.Some of the performances are listed in Tables 15 to 17.
All strategies perform at a similar level in terms of the exposed population, infection, and cumulative death.We only show the result of situation 3, since performance in other situations is quite the same.To find the best strategy against the uncertainty in DS ði;jÞ t , we further summarize the average performance metric in Tables 15 to 17 from 20 simulations for each strategy.
Situation 1 is static (no stochasticity in DS ði;jÞ t ), so Static and Stochastic have the same performance, superior to all strategies.In Situation 2 and 3, due to the uncertainty in DS ði;jÞ t , Static solution from Optimization (14) always performs worse than the Stochastic solution from Optimization (20).Nonetheless, two heuristic solutions have the lowest highest exposed population in all situations compared to intuition-based ones.
Since the objective function is minimizing all F ði;jÞ t (our estimation of the increase in exposed population), intuition-based strategies have a better performance in terms of infected and death populations.Besides, their performance in the exposed population is not far from heuristic solutions.More importantly, intuition-based strategies are much faster than solving an optimization problem.
Fig 1 shows how population changes in the SEIR model, with different rates of change from one state to another.

Fig 1 .
Fig 1.The classic SEIR model.Each state are noted in hollow letters.Their corresponding rate of change to the next state are marked on the arrow.https://doi.org/10.1371/journal.pone.0298932.g001

Fig 3
illustrates the distinction of non-virus-transmitted people, virus-transmitted and vaccinated people, virus-transmitted and unvaccinated people.

Fig 3 .
Fig 3. Population changes for S t population.All four kinds of people are susceptible at time t.White people have no changes and remain susceptible for time t + 1. Green people get vaccinated during this time and are not exposed to the virus.Red people get virustransmitted from other virus carriers and do not receive vaccination.Green-red people get vaccinated and get virus-transmitted.But they are treated as vaccinated people with immunity, and will not be counted as exposed for time t + 1. https://doi.org/10.1371/journal.pone.0298932.g003

Fig 5 ,
the confirmed cases exhibit two distinct waves of the pandemic in Hamilton County.We divide the pandemic into two stages, each with different sub-populations assigned to it.The first stage commenced in late May 2020, marked by the increasing severity of COVID spread.The second stage began in July when a new outbreak started.

Fig 4 .
Fig 4. Weekly COVID confirmed and estimated cases of new infection in Allegheny County.The vertical axis is the infected population.We compare the estimation of the infected population using historical data among the classic SEIR model, our proposed multi-feature SEIR, and actual confirmed infection cases.The observation period concludes on June 30th, 2021.https://doi.org/10.1371/journal.pone.0298932.g004

Fig 5 .
Fig 5. Weekly COVID confirmed and estimated cases of new infection in Hamilton County.The vertical axis is the infected population.We compare the estimates of the infected population using the classic SEIR model, our proposed multi-feature SEIR model, and the actual confirmed infection cases.The observation period concludes on July 5th, 2021.https://doi.org/10.1371/journal.pone.0298932.g005

Fig 6 .
Fig 6.Population change of exposed, infected, and dead state in situation 3 using intuition-based strategies and approximated optimal strategies.The population changes under given parameters and different vaccination strategies.All strategies perform similarly.Situation 1 and 2 present observably indifferent results, so only Situation 3 is presented.https://doi.org/10.1371/journal.pone.0298932.g006

Table 1 . Notations for multi-feature SEIR model.
Population division with s i (or s m ) and c j (or c k ).
For the remaining discussion, we must distinguish the actual and estimated value of DS ði;jÞ t .Denote the actual value of DS ði;jÞ , we underestimate the situation, as well as the amount of vaccine needed, making the solution provided by Optimization (14) inefficient.

Table 13 . Average loss ratio in cumulative death of each strategy under High-High situations.
https://doi.org/10.1371/journal.pone.0298932.t013 (21) Eq(21)is called "Stochastic".For details of the heuristic solution, please refer to the section on intuition-based vaccination and heuristic solutions.Table14for the first current period.For the other future periods, we use DS ði;jÞ for the first period, regardless of the realized value being different from F ði;jÞ t .While the Stochastic solution is aware of such difference, with an extra Constraint (20h).

Table 14 .
Example of discrete distributions for DS ði;jÞ t .